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Abstract. We are investigating how to program robots so that they learn tasks from 
practice. One method, task-level learning, provides advantages over simply perfect- 
ing models of the robot's lower level systems. Task-level learning can compensate 
for the structural modeling errors of the robot's lower level control systems and can 
speed up the learning process by reducing the degrees of freedom of the models to be 
learned. We demonstrate two general learning procedures — fixed-model learning and 
refined-model learning — on a ball-throwing robot system. Both learning approaches 
refine the task command based on the performance error of the system, while they 
ignore the intermediate variables separating the lower level systems. We provide both 
experimental and theoretical evidence that task-level learning can improve a robot's 
performance of a task. 
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1 Introduction 

We are investigating how to improve a robot's performance of a task. One popular 
approach is to calibrate the models that describe a robot's lower level systems — 
dynamics, kinematics, and perception. We are examining a different approach in 
which the robot system as a whole practices the desired task and learns from its 
experience. After each attempt at the task, a learning procedure is used to interpret 
the errors in task performance in order to adjust the commands that drive the robot. 

Task-level learning can proceed when the robot practices the task. The robot 
performs a task using its control system to generate appropriate commands, and then 
monitors its own performance with its perceptual apparatus. After each attempt 
at the task, a learning robot can use a model of the task to interpret the error in 
performance. With this information the robot can modify its original commands in 
order to better achieve the task on the next trial. The robot then performs the task 
again with a refined command, monitors the new results, and attempts the task once 
again if the performance goals are still not satisfied. The robot continues this sequence 
of performing the task, monitoring performance, and refining commands until the task 
is achieved. 

We have demonstrated this process of task-level learning with a ball-throwing robot 
system that improves its performance with practice (Figure 1). Given an estimate of 
the target, the system uses a ballistic model, a kinematic model, and a dynamics 
model to calculate a desired trajectory for the robot arm. The robot then throws the 
ball at the target. A vision system measures where the ball lands with respect to the 
target. Based on the error in performance, the system applies a task-level learning 
procedure to modify the desired trajectory of the robot arm. The robot system then 
throws the ball again with this updated trajectory. 

The task- level learning procedure modifies the system command based on the error 
in task performance. We have developed two learning algorithms — fixed model learn- 
ing and refined model learning — that were implemented on the ball-throwing robot 
system. Fixed-model learning modifies the system command by applying an inverse 
model of the system to the performance error. Refined-model learning is an interpo- 
lation procedure that refines the system command and improves the inverse model. 

Applying learning at the task level is a promising approach for making robots 
perform a wide range of common tasks. Learning provides a method of compensating 
for the structural modeling errors of the robot's lower level systems. Learning can also 
be applied at the task level to improve the performance of the entire robot system 
without focusing on each lower level system. In this paper we describe two task- 
level learning algorithms, demonstrate their effectiveness on a ball-throwing task, and 
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Figure 1: Ball-Throwing Robot System 
discuss the issues involved in applying these learning principles to other dynamic tasks. 



1.1 Modeling and Learning 

The robot system performs the task of throwing a ball at the target. The ball-throwing 
robot can be considered an input/output system (Figure 2) driven by input vari- 
ables (c) and responding with outputs variables (p) 



P = S(c) 



(1) 



The system is denoted by S(), the system inputs are termed commands, and the system 
outputs are termed performance. Figure 2 also shows that the lower level modules of 
the ball-throwing system — ballistics, kinematics, and dynamics — are concatenated to 
form the system S(). 

An inverse model of the system is used to choose commands c that will yield the 
desired task performance p^ 

c = S-^pd) (2) 

A caret ( A ) denotes the model of the system. Figure 3 shows the block diagram of the 
inverse model, as well as the block diagram that includes the lower level modules. 
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Figure 3: Inverse Model 



Driving the ball-throwing system and the ball throwing model with the same com- 
mand c will generally cause different results. The command c to the system will 
produce the actual performance which we label p, while the command c to the model 
will produce the desired performance pd (Figure 4). The difference between actual 
performance and desired performance is termed the performance error. The perfor- 
mance error is not zero because the model only approximates the actual system. 

The performance error can be reduced either by improving the model of the system 
or by refining the command c. With a better model, the discrepancy between actual 
and desired performance would be smaller. With a learning procedure, the perfor- 
mance error can be used to refine the command c, thereby improving performance on 
successive iterations. Using both approaches together will more effectively improve 
performance. 
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1.2 Relation to Previous Work 

We will briefly survey the fields of robot calibration and trajectory learning as they ap- 
ply to robot learning and robot functionality. Calibration research focuses on modeling 
robots more accurately so that they can perform a wider range of tasks. Trajectory 
learning research emphasizes learning from practice as a way to improve trajectory 
following performance. 

One way to increase robot functionality is to control the robot using better models. 
A good deal of research in robotics has thus been directed towards accurately modeling 
the lower level modules of robot systems. With these calibrated models, researchers 
are able to command robots to perform a wider range of tasks. We will thus discuss the 
advantages and disadvantages of calibration as a way to increase robot functionality. 
In particular, we will compare the calibration approach to learning from practice. 

Much recent work has concentrated on accurately calibrating the kinematic and 
dynamic models of robots. In kinematic calibration, for example, the actual Denavit- 
Hartenberg parameters are estimated based on experimental motions of the robot 
[Hayati and Mirmirani 1985; Wu 1984]. This kinematic model is often extended (and 
complicated) to include many non-linear effects that the Denavit-Hartenberg model 
neglects [Whitney et al. 1986]. Much work has also been done in the area of dynamic 
calibration, where the the actual masses and inertias of the links and loads are esti- 
mated based on trial motions of the robot [Atkeson et al. 1986; Mayeda et al. 1984; 
Mukerjee 1984; Neuman and Khosia 1985; Olsen and Bekey 1985]. 

Accurately modeling the lower level modules of a robot system has several ad- 
vantages. When the structure of the model is chosen correctly and the parameters 
are estimated correctly, the model is valid for any inputs and outputs. The experi- 
ence gained in selected trials generalizes to the entire range of operation. As a result, 



learning does not need to take place every time the model is used. Additionally, a 
structured model provides a compact method of representing the input/ output behav- 
ior of a system. Data need not be stored for every potential scenario, but the model 
can instead be evaluated when necessary. 

Attempts at improving models of the robot at the expense of introducing learning 
procedures also have several shortcomings. First, the number and range of robot 
motions necessary to fully estimate the model parameters is often large. Making trial 
motions that actually attempt the task may be a more efficient method of reaching 
the task goal. Second, no matter how well the parameters are estimated, the models 
are often based on structural assumptions that are not valid in practice. The Newton- 
Euler model of dynamics, for example, is based on rigid body dynamics which typically 
is a good but not perfect description of robots. Compensating for the structural 
assumptions of the model requires a great deal of data, time, and ingenuity. Even 
after compensation some structural modeling errors remain. 

Robot learning research has focused in the past on the trajectory following sub- 
task [Arimoto et al. 1985; Casalino and Gambardella 1986; Craig 1984; Furuta and 
Yamakita 1986; Hara et al. 1985; Harokopos 1986; Mita and Kato 1985; Morita 1986, 
Togai and Yamano 1986; Uchiyama 1978; Wang 1984; Wang and Horowitz 1985]. 
Robots are made to follow a particular trajectory better as they repeat the move- 
ment. Feedforward torque commands for repetitive movements are refined on the 
basis of previous movement errors. This research has focused primarily on linear 
learning operators which often ignore the underlying model of the system. The work 
has also emphasized the stability and not the performance of the proposed algorithms. 

We will explore the advantages of using the inverse model as the learning oper- 
ator and how to apply learning algorithms at the task level. Atkeson and Mclntyre 
[1986] began to explore fixed-model learning for the trajectory following subtask. The 
research shows that using the inverse Newton-Euler model as the learning operator 
reduces most of the movement errors. The theoretical convergence criteria and perfor- 
mance for the learning algorithm was then derived [Atkeson et al. 1987]. The success 
of the algorithm at the task level can now be assessed. The learning algorithm can 
be applied beyond a single lower level module — the dynamics component learned in 
trajectory following — to a series of lower level systems that together perform a task. 
The theoretical predictions of convergence and performance at the task level can then 
be confirmed experimentally. 



2 The Ball Throwing Task 

The task is to throw a ball at a target. Figure 1 illustrates the robot system that is 
configured to accomplish this task. The system includes the MIT Serial Link Direct 
Drive Arm [Asada and Youcef-Toumi 1987], a target plate, and a video camera. The 
height of the ball when it hits the target plate is monitored and improved by learning. 

The last link of the robot is used to throw the ball at the target plate. The robot 
is positioned so that the last link of the arm rotates in a vertical plane. The robot 
swings this link through a 180° arc from 225° to 45°, catapulting the ball to the target. 
The joint is servoed to a fifth order polynomial trajectory that accelerates the arm 
from rest until it reaches 135° and then decelerates the arm to rest at 45°. A 4cm 
rubber ball is placed into a 3.5cm diameter hole at the end of the third link of the 
robot. The ball leaves the hole as the robot arm decelerates during the throw. No 
release mechanism is used, but the release position is assumed to be when the last link 
is at 135°. The ball is thrown further by shortening the duration of the throw, which 
increases the release velocity (the trajectory length is held constant). This trajectory 
duration is the system command c that is refined by the learning procedure to achieve 
the task. 

A video camera records where the ball hits the target plate. The impact of the ball 
is sensed by a force sensor on which the target plate is mounted. This signal is used to 
choose the video frames to be stored for later analysis. After the throw, the location 
of the ball on the target plate is manually measured from the appropriate video frame. 
This location serves to measure the system performance p in accomplishing the task. 

The initial trajectory duration command and the arm trajectory itself are calcu- 
lated based on the measured distance between robot and target. A simple ballistic 
model, including only gravity, is used to represent the flight of the ball 

y d = vsin(0 rel )-t--g-t 2 (3) 

x d = v cos(0 re i) • t (4) 

For a given position of the target (xa, yd) a nd release angle # re i, the necessary release 
velocity is calculated by eliminating the variable t from Equations (3) and (4). The 
release angle (assumed to be at 135" so that re i = 45°) remains fixed through the 
experiment as part of the task strategy. Based on the desired release velocity and on 
the kinematics of the third link, a fifth order polynomial trajectory is computed for 
the arm that releases the ball with the appropriate velocity. 

Feedforward torques and a feedback controller drive the robot arm so that it follows 
the desired trajectory. The feedforward torques are calculated using the acceleration 



profile of the trajectory and the estimated inertia of the third link. A position- 
derivative feedback controller insures that the arm closely follows the desired trajectory 
on each throw. The control law that includes both feedforward and feedback torques 
is 

T = T ffwd - K p • {0 - 6 d ) - K v • (0 - 9 d ) (5) 

where K p and K v are the position and velocity feedback gains. 

The calculation of the initial trajectory duration has been based on an inverse 
model of the system. Given the distance and height between robot and target, the 
trajectory duration is computed. This model is only an approximation to the real 
system, and thus when the ball is initially thrown the actual performance usually 
differs from the desired performance. Many factors can cause this performance error 
including an inaccurate measurement of target location, inaccurate kinematic model, 
inaccurate vision system, inaccurate feedfoward torques, air resistance, torque satura- 
tion, arm command miscalibration, sensor miscalibration, and sensor noise, as well as 
errors in release angle, release velocity, actual trajectory duration, and others. 

The basis of task-level learning in this throwing experiment is to modify the top- 
level system command so that task performance improves. We present two general 
learning methods that improve system performance with practice. They both use the 
task performance error as well as the system model to refine the system command. 

2.1 Fixed-Model Learning 

In fixed-model learning the inverse model is used to estimate the correct commands 
based on the performance errors. A correction term, which we label A, is updated 
after each attempt at the task. In ball throwing, the error correction is updated after 
each throw by the amount the ball missed the target along the target plane. This 
measured error — whether positive or negative — updates A as a running sum 



*n+l 



A n - (hit n - target) (6) 



This error correction is added to the desired performance and transformed through 
the inverse model to calculate the new trajectory duration 



duration n+1 = S (target + A n+1 ) (7) 

A physical interpretation of Equation (7) is to consider the quantity (target + A n+1 ) 
to be the aim. In the case where the ball falls short, the error correction is positive, 
raising the aim by the amount of performance error. This action corresponds to our 



30 -i 



20 - 



S 

-S. 10 H 



CD 



-10 



DA 



DA 



o Fixed-Model 
a Refined-Model 



a 

-A- 



T 
3 



4 
Trial 



T 
5 



8 



Figure 5: Learning Performance 

intuition that we should aim higher if we are hitting too low. Together, Equations (6) 
and (7) provide the basis for fixed-model learning. 

Fixed-model learning was applied to the ball throwing task. The target was placed 
at a horizontal distance of 5.75 meters and a height of -0.9 meters from the robot. 
Using the ballistic and kinematic models of the robot system, a commanded trajectory 
duration of 138 milliseconds was calculated with Equation (7). (A is defined to be 
zero.) The robot threw the ball with this command, resulting in a performance error of 
28cm in the target plane. The error correction term was then calculated to be —28cm 
using Equation (6), and the commanded trajectory duration was refined to 143ms 
using Equation (7). On the next throw, the performance error was 22cm, and the 
error correction term was calculated to be —50cm. This iterative learning procedure 
was followed until the robot successfully completed the task. The trajectory duration 
for this successful throw was 156ms; the error correction was -97cm. The open boxes 
in Figure 5 show the performance improvement with practice. For the nth throw 
shown on the x-axis, the corresponding performance error is plotted on the y-axis. By 
applying fixed-model learning, the robot system successfully performed the task after 
the eighth iteration. 

It is important to generalize the fixed-model learning procedure so that it can be 
applied to other tasks. The system task commands are labeled with the vector c 
and the performance variables with the vector p. Generalizing the error correction 
equation to a multi-dimensional task, we write 



A n+1 = A n - (p n - p d ) 



(8) 



where Pd is the desired system performance. We also extend Equation (7) to 

Cn+^S-^pd + A^) (9) 

Fixed-model learning has successfully been applied to a multi-dimensional system in 
the case of trajectory following [Atkeson and Mclntyre 1986]. A similar approach has 
also been developed for the task of kinematic positioning at a visual target [Atkeson 
et al. 1987]. 

The convergence of fixed-model learning depends on the input/output behavior of 
both the model and system. The convergence criteria can be derived by using fixed 
point theory [Wang 1984; Wang and Horowitz 1985]. A learning algorithm can be 
viewed as a mapping of error corrections on the nth attempt to error corrections on 
the next attempt 

A n+1 = F(A n ) (10) 

The fixed-model learning algorithm can be put into this form by substituting Equa- 
tions (1) and (9) into Equation (8). Fixed-model learning modifies the nth error cor- 
rection term by adding an amount based on the predicted performance transformed 
by the actual system 

A n+1 = A n - SCS-^pd + A n )) + Pd (11) 

Note that when the correct command c* is achieved by using the right error correction 
term A*, then c* = S -1 (pd + A*). When the desired performance pd is achieved 
using this correct command c*, then pd = S(c*), and Equation (11) reduces to the 
fixed point A n+1 = A n = A*. 

We can ask whether this fixed point is stable by analyzing a linearization of Equa- 
tion (11) at the point (c,p) = (c*,pd). We begin by writing two equations for small 
perturbations around the fixed point. For perturbations Sc and Sp from the fixed 
point, 

S(c* + 6c) = p d + J(c*)6c (12) 

S-^pd + A' + SAn) = S-^pd + A^ + J-^pd + A^An (13) 

where J is the Jacobian matrix for the system S(), J is the Jacobian matrix for the 
model S(), and J is the Jacobian matrix for the inverse model S -1 (). To analyze 
the fixed point for stability, we consider the case in which the nth error correction 
term is perturbed from A* by 6A n so that A n = A* + SA n . The change in the 
next error correction term, <5A n+1 = A n+1 — A*, can be computed by substituting 
Equations (12) and (13) into Equation (11) 

6A n+1 = (I - J(c*) J-^Pd + A*))£A n (14) 
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Figure 6: Command-Performance Behavior 

Alternatively, we can obtain the error in the next command by multiplying through 
Equation (14) by J _1 (p d + A*) and noting that Sc n = J _1 (p d + A*)SA n 



Sc Q+1 = (I - J" l ( Pd + A*)J(c*))£c n 



(15) 



The matrix (I - J -1 J) indicates whether fixed- model learning will converge. When 
the model and system are both linear functions of their inputs, the matrix (I — J -1 J) 
provides global convergence criteria. The command error Sc will decrease when all 
the eigenvalues of the matrix (I - J -1 J) are less than one in absolute value, with the 
rate of decrease determined by the magnitude of the eigenvalues. If the magnitudes 
of all the eigenvalues are less than one, the learning process is stable and performance 
improves with practice. The magnitude of the eigenvalues of (I - J -1 J) depends on 
how accurately J -1 inverts J, and thus the stability of the learning algorithm depends 
on how closely the learning operator inverts the controlled system. This is a benefit 
of better modeling. 

In the general case where model and system are non-linear functions of their inputs, 
it is difficult to develop global convergence criteria. The criteria developed for the 
linear case can be applied locally, however, as a necessary but not sufficient condition 
for convergence. Thus, all the eigenvalues of the matrix (I - J -1 J) must be less than 
one in absolute value for learning to converge. If the magnitude of any eigenvalue 
is greater than one, fixed-model learning will almost certainly degrade performance. 
The better the model approximates the system, the closer the magnitudes will be to 
zero, and the more likely learning is to converge. 

We applied this local convergence criteria to the ball throwing system. Plots of the 
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Figure 7: Convergence of Fixed-Model Learning 

command-performance behavior of both the model and system are shown in Figure 6. 
The model and system were fit by linear functions: J was estimated to be -1.70 and 
J was estimated to be -5.44. The eigenvalue of (I — J -1 J) for this model and system 
was then calculated to be 0.69. This value is less than one, indicating that the ball 
throws are likely to converge to the target. The open boxes in Figure 5 demonstrate 
that the ball throws did in fact converge. 

The performance of the learning procedure refers to the rate of convergence. 
The geometric rate of convergence is given by the magnitudes of the eigenvalues of 
(I — J J). The best performance is achieved when all the eigenvalues are close to 
zero. In the ball throwing task, for example, the geometric rate of convergence was 
calculated to be 0.69. The small squares in Figure 7 illustrate this theoretical rate of 
convergence. The actual iterations of fixed-model learning, denoted by open boxes, 
closely approximate this prediction. As a model is made more accurate, the eigenval- 
ues of the matrix (I — J -1 J) approach zero leading to fast convergence and improved 
learning. 

2.2 Refined-Model Learning 

Refined-model learning refines the model as well as the command during practice. 
The refined-model approach essentially constructs a local linear model of the system 
from the last m + 1 attempts at the task, given a system of to inputs and m outputs. 
Thus, this method is an alternative to fixed-model learning only after the (m + l)th 
iteration. Once in use, refined-model learning sacrifices the original model structure 
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for a simpler local model. This local model is updated after each attempt at the task, 
leading to a refined system command. 

In ball throwing, refined-model learning was applied after two attempts at the 
task. The first throw with a trajectory duration of 138ms resulted in a performance 
error of 28cm. The second throw with a 143ms duration resulted in a performance 
error of 22cm. Refined-model learning linearly extrapolated between these two points, 
suggesting a throw with a trajectory duration of 160ms. Given the performance error, 

e n 

e n = hit n — target (16) 

the iteration rule for refined-model learning is 

en 

durational = duration n - ry^ r- (17) 

(en_! — e n )/(duration n _i — duration n ) 

The results of refined-model learning for the ball throwing task are given by the open 
triangles in Figure 5. The desired performance was reached in just five iterations. 

Refined-model learning can be generalized to multi-dimensional tasks. We first 
define the performance error e n to be the difference between actual and desired per- 
formance on the nth iteration 

e n = Pn - Pd (18) 

We next define AC and AE to be m x m matrices 

AC = [ c - c n ci - c n • • • c n _! - c n ] (19) 

AE = [ e - e n e x - e n • • • e n _ x - e n ] (20) 

where the A should not be confused with the error correction term in fixed-model 
learning. The general refined-model learning equation is then written as 

c n+1 =c n -AC(AE)- 1 e n (21) 

Refined-model learning in this form is similar to the secant method of finding zeros of 
functions [Gragg and Stewart 1976]. As such, improvements for avoiding singularities 
and for hastening convergence that are found in the numerical methods literature can 
be readily applied. Our primary interest here is to propose general learning procedures 
which can later be refined as they appear promising. 

Refined-model learning will converge only if the first attempts at the task are suffi- 
ciently near the desired performance and if the system function is sufficiently smooth 
in that neighborhood. Because of these two restrictions, a principled and conservative 
approach to extrapolation should be taken. In the case of a one-dimensional system, 
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Figure 8: Actual and Desired Arm Trajectories for the Final Throw 

for example, it might be useful to set a limit as to how far to believe an extrapolation. 
It might also be wise to interpolate instead of extrapolating as soon as a point on both 
sides of the desired performance is found [Press et al. 1986]. 

The performance of refined-model learning depends primarily on the input/output 
behavior of the system. The performance on the first learning iteration also depends 
on the accuracy of the internal model. A better model makes the original performance 
error smaller, making learning faster. In the case of a one-dimensional system, the 
performance error, en, will decay superlinearly. This convergence rate is faster than 
for fixed-model learning, which only converges geometrically [Forsythe et al. 1977]. 
Figure 5 demonstrates this performance advantage in the throwing task. 

3 Discussion 

The most important contribution of this work is to demonstrate that learning at the 
task-level can take place in the absence of perfect modeling at lower levels. Further, 
the top-level task is learned and not the individual modules that make up the system. 
In our ball-throwing robot system, the ballistic, kinematic, and dynamic models were 
never improved, whereas the task performance was. The desired trajectory of the 
robot arm was never perfectly followed, for example, but the task was accomplished. 
Figure 8 shows the desired velocity trajectory on the final throw that resulted in a 
perfect hit (zero error) as well as the actual velocity trajectory of the robot arm. The 
difference between the desired and actual trajectories is an indication that learning can 
proceed at the task level, even though lower level modules do not perform perfectly. 
Many important issues remain to be investigated further. One issue is how the 
repeatability of the robot system affects the learning process. Noise will affect both the 
performance and stability of each learning algorithm differently. Sensor and actuator 
noise was present in our robot system when throwing 5.75m, leading to an error range 
of about 0.04m in the target plane. As a result, it may be useful to filter the measured 
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performance errors or the calculated command corrections at the cost of reducing the 
rate of convergence. Both experimental and theoretical work needs to be done on this 
issue. 

While the system is learning a task, the model itself can be made more accurate. 
Refined-model learning takes this approach, as it builds a local linear model of the 
actual system to produce the next task command. The experience of all but the last 
two attempts at the task is ignored, however. Another potential approach is to build 
a local model on top of the structural model of the system as the task is practiced. 
A third approach is for the system to continually calibrate itself with respect to a 
task variable [Brooks et al. 1987]. With these more accurate models learning will be 
hastened. 

Learning at the lower level can sometimes be more stable. Instead of applying task- 
level learning to the system, learning can be applied to each module independently. 
This modular learning approach can guarantee convergence in certain cases where task- 
level learning actually degrades performance. The convergence rate and the robustness 
to noise in the intermediate sensors are two issues that must be investigated, however. 

One of the most striking aspects of human learning is that we generalize our ex- 
perience to related tasks. While we have developed some general learning procedures 
that apply to many tasks, some theoretical work is necessary to suggest when practice 
on one task can be generalized to another. 

4 Conclusions 

Learning from practice has been demonstrated to improve the performance of a fairly 
simple task — ball throwing. Task-level performance errors are reduced by actually 
attempting the task. The lower level component modules of the system can be largely 
ignored during this learning process. The two learning procedures are both an alter- 
native to extensive model calibration and a complement to more accurate modeling. 

The performance of many different tasks can be improved with fixed-model and 
refined-model learning. For a given task, the commands and performance can be 
defined, the system can be modeled, and a learning procedure can then be applied. 
We plan to demonstrate that the learning procedures developed here are applicable to 
complex tasks such as juggling. 

This research may also provide some insight into how people achieve everyday tasks. 
Actual implementation forces us to consider what internal models, computational 
resources, memory capacities, and learning procedures are necessary to achieve even 
the most routine tasks. 
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